A Meta-Analysis of PRIF Studies
WZB Berlin
May 23, 2023
hiddenmeta [R] package
Question: What are the relative biases of different methods for estimating the prevalence and size of hard-to-reach populations?
Setting: 8 PRIF studies of the prevalence of various types of human trafficking around the world, using somewhat harmonized designs
| Implementing organization | Location | Hard-to-reach group (numerator) | Population (denominator) |
|---|---|---|---|
| Freedom Fund (FF) | Brazil (Recife) | Women aged 18-21 who were sex workers before age 18 | Women aged 18-21 who are sex workers |
| Stanford University (Stanford) | Brazil | Forced labor in agriculture sector | Employed in agriculture sector |
| New York University (NYU) | Costa Rica | Forced labor in fishing industry | Employed in fishing industry |
| New York University (NYU) | Tanzania (Dar es Salaam, Iringa, and Zanzibar) | Domestic servitude | Domestic workers |
| NORC | Morocco | Domestic servitude | Domestic workers |
| Johns Hopkins University (JHU) | Pakistan (Sindh) | Forced labor in brick kiln industry | Employed in brick kiln industry |
| University of Massachusetts, Lowell (UMass) | Tunisia (Tunis) | Domestic servitude | All domestic workers |
| RTI | USA | Forced labor in construction industry | All construction workers |
Approach: Use MIDA to provide both meta-analysis of PRIF studies and simulations for study design diagnosis
flowchart TD
A{{"Model (Features of unobserved population)"}} === B{{"Inquiries (Quantities of interest)"}}
B === C{{"Data strategy (Sampling strategies used)"}}
C === D{{"Answer strategy (Estimation methods used)"}}
Implementation: Provide a well-documented package that implements sampling and estimation strategies (at both the study and meta levels): gsyunyaev.com/hiddenmeta
Following MIDA:
get_study_population()
get_study_estimands()
| Inquiry | Estimand |
|---|---|
| known_size | 105.000 |
| known_prev | 0.105 |
| hidden_size | 202.000 |
| hidden_prev | 0.202 |
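The size and prevalence estimands in the table above appear to be linked through a population of 1,000 units (105 / 1000 = 0.105 and 202 / 1000 = 0.202). A minimal base-R sketch of that relationship follows; the toy population, the column names, and the membership rates are assumptions made for illustration and are not taken from hiddenmeta.

```r
# Toy population: two binary membership indicators for N units.
# Column names and rates are illustrative assumptions, not hiddenmeta internals.
set.seed(1)
N <- 1000
pop <- data.frame(
  known  = rbinom(N, 1, 0.1),  # member of a group of known size
  hidden = rbinom(N, 1, 0.2)   # member of the hard-to-reach group
)

# Size is a count; prevalence is the same count divided by N.
estimands <- data.frame(
  inquiry  = c("known_size", "known_prev", "hidden_size", "hidden_prev"),
  estimand = c(sum(pop$known), mean(pop$known),
               sum(pop$hidden), mean(pop$hidden))
)
print(estimands)
```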
sample_rds() / sample_pps() / sample_tls()
get_study_est_*()
| Sampling | Estimation | Method references | [R] Package |
|---|---|---|---|
| PPS | HT: Horvitz-Thompson estimator of prevalence with re-scaled bootstrap standard errors | Rust and Rao (1996) | Self-coded + surveybootstrap (Feehan and Salganik 2023) |
| TLS | HT: Horvitz-Thompson estimator of prevalence with re-scaled bootstrap standard errors | Rust and Rao (1996) | Self-coded + surveybootstrap (Feehan and Salganik 2023) |
| RDS | SS: Sequential sampling estimator of prevalence | Gile (2011) | RDS (Mark S. Handcock et al. 2023) |
| RDS+/LTS | LINK: Link-tracing prevalence estimator based on snowball sample | Vincent and Thompson (2022) | Self-coded based on Vincent and Thompson (2022) |
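To make the HT rows concrete, here is a minimal sketch of the Horvitz-Thompson point estimate of prevalence, assuming known inclusion probabilities and a known population size. The function and argument names are hypothetical, the re-scaled bootstrap standard errors are omitted, and this is not the hiddenmeta implementation.

```r
# Horvitz-Thompson estimate of a population total:
# each sampled unit is weighted by the inverse of its inclusion probability.
ht_total <- function(y, incl_prob) {
  stopifnot(length(y) == length(incl_prob),
            all(incl_prob > 0), all(incl_prob <= 1))
  sum(y / incl_prob)
}

# Prevalence is the estimated total divided by the (assumed known) population size N.
ht_prevalence <- function(y, incl_prob, N) {
  ht_total(y, incl_prob) / N
}

# Five sampled units, each drawn with probability 0.01 from a population of 500
y <- c(1, 0, 0, 1, 0)
incl_prob <- rep(0.01, 5)
print(ht_prevalence(y, incl_prob, N = 500))
```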
| Sampling | Estimation | Method references | [R] Package |
|---|---|---|---|
| PPS | NSUM: Network scale-up estimator of population size with re-scaled bootstrap standard errors | Killworth et al. (1998); Rust and Rao (1996) | networkreporting (Feehan and Salganik 2016) + surveybootstrap (Feehan and Salganik 2023) |
| TLS | NSUM: Network scale-up estimator of population size with re-scaled bootstrap standard errors | Killworth et al. (1998); Rust and Rao (1996) | networkreporting (Feehan and Salganik 2016) + surveybootstrap (Feehan and Salganik 2023) |
| TLS | RECAP *: Mark-recapture estimator of population size with parametric standard errors | Hickman et al. (2006) | Rcapture (Baillargeon and Rivest 2007) |
| RDS | SSPSE: Bayesian sequential sampling model of population size | Mark S. Handcock, Gile, and Mar (2014) | sspse (Mark S. Handcock et al. 2022) |
| RDS | CHORDS: Epidemiological model for population size estimation | Berchenko, Rosenblatt, and Frost (2017) | chords (Berchenko, Rosenblatt, and Frost 2017) |
| RDS | MULTIPLIER *: Service-multiplier population size estimator with bootstrap re-sampling standard errors | Hickman et al. (2006); Salganik (2006) | Self-coded + surveybootstrap (Feehan and Salganik 2023) |
| RDS+/LTS | LINK: Link-tracing population size estimator based on snowball sample | Vincent and Thompson (2022) | Self-coded based on Vincent and Thompson (2022) |
| RDS+/LTS | MSE: Link-tracing population size estimator based on multiple link-tracing samples | Vincent and Thompson (2017) | Self-coded based on Vincent and Thompson (2017) |
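As an illustration of the NSUM rows, the basic network scale-up logic can be sketched in a few lines: respondents' network sizes are estimated from how many members of groups of known size they report knowing, and reported ties to the hidden group are then scaled up to the whole population. This is an unweighted sketch under that textbook formulation, with hypothetical function and argument names; it ignores sampling weights and bootstrap standard errors and is not the networkreporting or hiddenmeta code.

```r
# Basic (unweighted) network scale-up estimate of hidden population size.
# y_hidden:    number of hidden-group members each respondent reports knowing
# m_known:     respondents x groups matrix of reported ties to known-size groups
# known_sizes: true sizes of those known groups
# N:           total population size
nsum_size <- function(y_hidden, m_known, known_sizes, N) {
  stopifnot(nrow(m_known) == length(y_hidden),
            ncol(m_known) == length(known_sizes))
  # Known-population estimate of each respondent's personal network size
  degree <- N * rowSums(m_known) / sum(known_sizes)
  # Share of all reported ties that lead to the hidden group, scaled to N
  N * sum(y_hidden) / sum(degree)
}

m_known <- matrix(c(1, 2,
                    0, 3,
                    2, 1), nrow = 3, byrow = TRUE)
print(nsum_size(y_hidden = c(2, 1, 3), m_known = m_known,
                known_sizes = c(100, 200), N = 10000))
```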
Running estimation implemented in the hiddenmeta package is straightforward…
```r
# Sequential sampling (SS) prevalence estimate from the RDS sample
hiddenmeta::get_study_est_ss(
  data = jhu_pakistan,
  sampling_frame = "known1",
  hidden_var = "hidden",
  total_var = "total",
  n_coupons = 3,
  prefix = "rds",
  label = "rds_ss")

# Horvitz-Thompson (HT) estimate from the TLS sample
hiddenmeta::get_study_est_ht(
  data = umass_tunisia,
  hidden_var = "hidden",
  survey_design = ~ tls_cluster,
  weight_var = "tls_weight",
  prefix = "tls",
  label = "tls_ht")
```
| Study | Inquiry | Sample | Estimator | Estimate (Team) | Estimate (Replication) | Std. Error (Team) | Std. Error (Replication) |
|---|---|---|---|---|---|---|---|
| jhu_pakistan | hidden_prev2 | rds | ss | 0.255 | 0.457 | 0.023 | 0.04 |
| jhu_pakistan | hidden_size2 | pps | ht | 27400 | 3208 | ||
| jhu_pakistan | hidden_prev2 | pps | ht | 0.125 | 0.196 | 0.019 | 0.023 |
| umass_tunisia | hidden_size2 | tls | ht | 1383 | 208 | ||
| umass_tunisia | hidden_prev2 | tls | ht | 0.238 | 0.22 | 0.013 | 0.033 |
| umass_tunisia | hidden_size2 | tls | recap | 3148 | 870 | ||
| umass_tunisia | hidden_size2 | tls | mse | 3360 | 1048 | ||
```stan
data {
  int<lower=0> N;                       // number of sites
  int<lower=0> K;                       // number of sampling-estimation designs
  int<lower=0> n_ests;                  // number of estimates
  real<lower=0> alpha_prior_mean[N];
  real<lower=0> alpha_prior_sd[N];
  real<lower=0> bias_prior_mean[K];
  real<lower=0> bias_prior_sd[K];
  int<lower=0> site[n_ests];            // site index of each estimate
  int<lower=0> design[n_ests];          // design index of each estimate
  vector<lower=0>[n_ests] ests;
  vector<lower=0>[n_ests] ses;
}
parameters {
  vector<lower=0.01>[K] beta;           // relative bias of each design
  vector<lower=0,upper=1>[N] alpha;     // prevalence at each site
}
model {
  target += normal_lpdf(ests | beta[design] .* alpha[site], ses);
  alpha ~ normal(alpha_prior_mean, alpha_prior_sd);
  // uses the bias priors declared in the data block
  beta ~ lognormal(log(bias_prior_mean), bias_prior_sd);
}
```
Note: we assume that the standard errors correctly estimate the standard deviation of the sampling distribution of the estimate (even if the estimate itself is biased)
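Written out, the Stan model above corresponds to the following. The notation is introduced here for illustration only: \(y_k\) is estimate \(k\), \(s_k\) its standard error, \(i[k]\) and \(j[k]\) its site and design, \(\mu_i, \tau_i\) the prior mean and standard deviation for site \(i\), and \(m_j, \nu_j\) the bias prior mean and log-scale standard deviation for design \(j\).

\[ y_k \sim \mathcal{N}\!\left(\beta_{j[k]}\,\alpha_{i[k]},\; s_k^2\right), \qquad \alpha_i \sim \mathcal{N}(\mu_i, \tau_i^2) \text{ restricted to } [0, 1], \qquad \beta_j \sim \mathrm{LogNormal}(\log m_j, \nu_j^2) \text{ restricted to } \beta_j \ge 0.01 \]

Each observed estimate is treated as the site's prevalence \(\alpha_i\) multiplied by the relative bias \(\beta_j\) of the design that produced it, so \(\beta_j = 1\) corresponds to an unbiased design.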
Distribution of expected bias across respondents
PPS-HT has the tightest distribution, centered close to 1.
Average bias expectation with average lower and upper bounds of credibility intervals
TLS-NSUM has its “average” credibility interval entirely above 1.
| Study | LTS-HCG | LTS-NE4NS | LTS-NE4NS2 | LTS-VH | PPS-HT | RDS-SS | TLS-HT | TLS-RMARK |
|---|---|---|---|---|---|---|---|---|
| Brazil (FF) | x | |||||||
| Pakistan (JHU) | x | x | ||||||
| Morocco (NORC) | x | x | ||||||
| Costa Rica (NYU) | x | x | x | x | x | |||
| Tanzania (NYU) | x | x | x | x | x | |||
| Tunisia (UMass) | x | x |
Estimated bias vs. bias priors
Instead of assuming that the relative bias of a method \(\beta_{j}\) is constant, we assume that at each site it is a draw from a distribution: \[\beta_{j} \sim f(\beta^*_{j}, \sigma_{j})\] We estimate both \(\beta^*_{j}\), the expected relative bias, and \(\sigma_{j}\), the heterogeneity in relative bias.
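A small simulation can show what the two parameters control. The slides do not specify \(f\); the sketch below assumes a log-normal distribution, parameterised so that its mean equals \(\beta^*_{j}\), and the function name and all numeric values are hypothetical.

```r
# Assumption: f is log-normal, parameterised so that E[beta] = beta_star.
# sigma is the standard deviation on the log scale.
draw_site_bias <- function(n_sites, beta_star, sigma) {
  stopifnot(beta_star > 0, sigma >= 0)
  rlnorm(n_sites, meanlog = log(beta_star) - sigma^2 / 2, sdlog = sigma)
}

set.seed(42)
# Same expected relative bias, different heterogeneity across six sites
low_het  <- draw_site_bias(6, beta_star = 1.5, sigma = 0.05)
high_het <- draw_site_bias(6, beta_star = 1.5, sigma = 0.60)
print(round(rbind(low_het, high_het), 2))
```

With large \(\sigma_{j}\), the same method can overestimate at one site and underestimate at another even when its expected relative bias is above 1.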
Estimated bias comparisons
The two sets of estimates correlate. The basic message, though, is that the heterogeneous model is consistent with strong heterogeneity: we entertain the possibility that every method could overestimate or underestimate.
This model has many parameters relative to the number of data points \(\Rightarrow\) the results are sensitive to the priors
Prevalence estimates from two models
\[ \mathrm{MSE}(\hat\beta) = \mathrm{Var}(\hat\beta) + \mathrm{Bias}(\hat\beta,\beta)^2 \]
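The decomposition can be checked numerically. The sketch below simulates a hypothetical estimator that overstates a true value by 25% and confirms that its mean squared error equals variance plus squared bias; all numbers are illustrative assumptions, not results from the PRIF studies.

```r
set.seed(7)
truth <- 0.2                                     # hypothetical true prevalence
# Hypothetical estimator: biased upward by 25%, with sampling noise
est <- rnorm(10000, mean = 1.25 * truth, sd = 0.03)

mse      <- mean((est - truth)^2)
bias     <- mean(est) - truth
variance <- mean((est - mean(est))^2)            # divides by n, not n - 1

# The identity holds exactly (up to floating point) with the 1/n variance
print(c(mse = mse, var_plus_bias2 = variance + bias^2))
```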
| Study | Type of sample | Sample size | Costs (in 1000 USD) |
|---|---|---|---|
| Brazil (FF) | PPS | 1000 | 50 |
| Brazil (FF) | RDS | 600 | 100 |
| Pakistan (JHU) | PPS | 800 | 45 |
| Pakistan (JHU) | RDS | 500 | 65 |
| Morocco (NORC) | LTS/RDS+ | 989 | 350 |
| Morocco (NORC) | TLS/TLS-RECAP | 1067 | 450 |
| Costa Rica (NYU) | PPS | 1017 | 79 |
| Costa Rica (NYU) | LTS/RDS+ | 1009 | 212 |
| Tanzania (NYU) | LTS/RDS+ | 788 | 15 |
| Tanzania (NYU) | PPS | 1052 | 55 |
| Tunisia (UMass) | TLS-RECAP | 1016 | 80 |
| Tunisia (UMass) | TLS | 1029 | 100 |
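One simple way to compare rows of the cost table is cost per respondent. The sketch below does the arithmetic for the first four rows, taking the sample sizes and costs as given above; the data frame and column names are made up for this example.

```r
# First four rows of the cost table (costs in 1000 USD)
costs <- data.frame(
  study     = c("Brazil (FF)", "Brazil (FF)", "Pakistan (JHU)", "Pakistan (JHU)"),
  sample    = c("PPS", "RDS", "PPS", "RDS"),
  n         = c(1000, 600, 800, 500),
  cost_kusd = c(50, 100, 45, 65)
)

# Convert to USD and divide by the number of respondents
costs$usd_per_respondent <- 1000 * costs$cost_kusd / costs$n
print(costs)
```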
We are setting up the dashboard so that you can add more data and the meta-analysis updates automatically.
The dream is that, as you do more studies, you use multiple methods and feed the results into this “rolling meta-analysis”.